Demystifying RCE Vulnerabilities in LLM-Integrated Apps
In recent years, Large Language Models (LLMs) have demonstrated remarkable
potential across various downstream tasks. LLM-integrated frameworks, which
serve as the essential infrastructure, have given rise to many LLM-integrated
web apps. However, some of these frameworks suffer from Remote Code Execution
(RCE) vulnerabilities, allowing attackers to execute arbitrary code on apps'
servers remotely via prompt injections. Despite the severity of these
vulnerabilities, no existing work has systematically investigated them. This
poses a significant challenge for detecting vulnerabilities in frameworks and
in real-world LLM-integrated apps.
To fill this gap, we present two novel strategies: 1) LLMSmith, a static
analysis-based tool that scans framework source code for potential RCE
vulnerabilities, and 2) a prompt-based automated testing approach that
verifies these vulnerabilities in LLM-integrated web apps. We discovered
13 vulnerabilities in 6 frameworks, including 12 RCE vulnerabilities and 1
arbitrary file read/write vulnerability. 11 of these were confirmed by the
framework developers, resulting in the assignment of 7 CVE IDs. After testing
51 apps, we found vulnerabilities in 17 apps, 16 of which are vulnerable to RCE
and 1 to SQL injection. We responsibly reported all 17 issues to the
corresponding developers and received acknowledgments. Furthermore, we amplify
the attack impact beyond RCE, allowing attackers to exploit other app users
(e.g., app response hijacking, user API key leakage) without direct
interaction between the attacker and the victim. Lastly, we propose mitigation
strategies to improve the security awareness of both framework and app
developers, helping them address these risks effectively.
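The attack surface described above can be illustrated with a minimal sketch. The code below is hypothetical (it is not LLMSmith nor any specific framework's code): it shows the vulnerable pattern of passing LLM output straight to exec(), together with a conservative AST-based guard of the kind a mitigation strategy might use.

```python
import ast

# Hypothetical vulnerable pattern: many LLM-integrated frameworks pass
# model-generated text straight to exec()/eval(), so a prompt injection
# that controls llm_output achieves remote code execution.
def run_generated_code_unsafely(llm_output: str) -> None:
    exec(llm_output)  # DO NOT do this: attacker-controlled code runs here

BANNED_NAMES = {"exec", "eval", "__import__", "open", "compile"}

def looks_dangerous(llm_output: str) -> bool:
    """Conservative static check on generated code: reject imports,
    dunder attribute access, and references to dangerous builtins.
    A sketch only; real sandboxing needs much more than this."""
    try:
        tree = ast.parse(llm_output)
    except SyntaxError:
        return True  # refuse anything that does not even parse
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return True
        if isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            return True
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            return True
    return False
```

Such a denylist check is easy to bypass and serves only to make the vulnerable pattern concrete; proper mitigations isolate generated code in a sandboxed process.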
ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning
Large-scale pre-trained models such as CodeBERT and GraphCodeBERT have
attracted widespread attention from both academia and industry. Owing to their
superior ability in code representation, they have been further applied in
multiple downstream tasks such as clone detection, code search and code
translation. However, it is also observed that these state-of-the-art
pre-trained models are susceptible to adversarial attacks. The performance of
these pre-trained models drops significantly with simple perturbations such as
renaming variable names. This weakness may be inherited by their downstream
models and thereby amplified at an unprecedented scale. To this end, we
propose ContraBERT, an approach that improves the robustness of pre-trained
models via contrastive learning. Specifically, we design nine kinds
of simple and complex data augmentation operators on the programming language
(PL) and natural language (NL) data to construct different variants.
Furthermore, we further train the existing pre-trained models with masked
language modeling (MLM) and a contrastive pre-training task on the original
samples and their augmented variants to enhance the robustness of the model.
Extensive experiments demonstrate that ContraBERT effectively improves the
robustness of existing pre-trained models. A further study confirms that these
robustness-enhanced models outperform the original models on four popular
downstream tasks.
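The semantics-preserving augmentation idea can be made concrete with a toy example. The sketch below is my own illustration, not ContraBERT's actual operator implementation: a variable-renaming operator that changes a program's surface form while preserving its behavior, the kind of "simple" perturbation the abstract describes.

```python
import ast

def rename_variables(source: str) -> str:
    """Consistently rename every identifier in a Python snippet so that
    semantics are preserved while the surface form changes. A real
    operator would skip builtins and imported names; this sketch does
    not, for brevity."""
    tree = ast.parse(source)
    names = sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})
    mapping = {name: f"var_{i}" for i, name in enumerate(names)}

    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node: ast.Name) -> ast.Name:
            node.id = mapping.get(node.id, node.id)
            return node

    return ast.unparse(Renamer().visit(tree))
```

A model robust in the sense the abstract describes should embed a program and its renamed variant close together; the contrastive objective pulls such pairs together while pushing unrelated samples apart.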
TransRepair: Context-aware Program Repair for Compilation Errors
Automatically fixing compilation errors can greatly improve software
development productivity by guiding novice and AI programmers as they write
and debug code. Recently, learning-based program repair has gained extensive
attention and become the state of the art in practice, but it still leaves
plenty of room for improvement. In this paper, we propose an end-to-end
solution, TransRepair, that simultaneously locates error lines in a C program
and generates their correct substitutes. Unlike its counterparts, our approach
takes into account both the context of the erroneous code and the diagnostic
compilation feedback. We then devise a Transformer-based neural network to
learn the ways of repair from the erroneous code as well as its context and the
diagnostic feedback. To increase the effectiveness of TransRepair, we summarize
5 types and 74 fine-grained sub-types of compilation errors from two
real-world program datasets and the Internet. Then a program corruption
technique is developed to synthesize a large dataset with 1,821,275 erroneous C
programs. Through extensive experiments, we demonstrate that TransRepair
outperforms the state-of-the-art in both single repair accuracy and full repair
accuracy. Further analysis sheds light on the strengths and weaknesses of
contemporary solutions for future improvement.
Comment: 11 pages, accepted to ASE '2
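The program-corruption technique used to synthesize the training corpus can be sketched in a few lines. The two mutations below are illustrative only, standing in for the paper's 74 fine-grained sub-types: each takes a compilable C program and injects one plausible compilation error.

```python
import random

def corrupt(program: str, rng: random.Random) -> str:
    """Inject one synthetic compilation error into a C program.
    Hypothetical sketch: the actual corruption technique covers 74
    fine-grained error sub-types; these two are stand-ins."""
    mutations = [
        lambda s: s.replace(";", "", 1),         # drop the first semicolon
        lambda s: s.replace("int ", "itn ", 1),  # misspell a type name
    ]
    return rng.choice(mutations)(program)
```

Pairing each corrupted program with its original yields (erroneous, fixed) training examples at scale, which is how a dataset of 1,821,275 erroneous programs becomes feasible.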
Large-Scale Analysis of Framework-Specific Exceptions in Android Apps
Mobile apps have become ubiquitous. For app developers, it is a key priority
to ensure their apps' correctness and reliability. However, many apps still
suffer from occasional to frequent crashes, weakening their competitive edge.
Large-scale, deep analyses of the characteristics of real-world app crashes can
provide useful insights to guide developers, or help improve testing and
analysis tools. However, such studies do not exist -- this paper fills this
gap. Over a four-month long effort, we have collected 16,245 unique exception
traces from 2,486 open-source Android apps, and observed that
framework-specific exceptions account for the majority of these crashes. We
then extensively investigated the 8,243 framework-specific exceptions (which
took six person-months): (1) identifying their characteristics (e.g.,
manifestation locations, common fault categories), (2) evaluating their
manifestation via state-of-the-art bug detection techniques, and (3) reviewing
their fixes. Besides the insights they provide, these findings motivate and
enable follow-up research on mobile apps, such as bug detection, fault
localization and patch generation. In addition, to demonstrate the utility of
our findings, we have optimized Stoat, a dynamic testing tool, and implemented
ExLocator, an exception localization tool, for Android apps. Stoat is able to
quickly uncover three previously-unknown, confirmed/fixed crashes in Gmail and
Google+; ExLocator is capable of precisely locating the root causes of
identified exceptions in real-world apps. Our substantial dataset is made
publicly available to share with and benefit the community.
Comment: ICSE'18: the 40th International Conference on Software Engineering
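The core classification step, deciding whether a crash is framework-specific, can be sketched as follows. This is an assumed heuristic on standard Java stack-trace text, not the paper's actual implementation: a crash is attributed to the framework when the exception-throwing frame lives in an Android framework package.

```python
import re

# Assumed package prefixes for Android framework code (illustrative).
FRAMEWORK_PREFIXES = ("android.", "com.android.")

def top_frame_package(trace: str) -> str:
    """Return the fully qualified method of the first 'at ...' frame,
    i.e. where the exception was thrown."""
    m = re.search(r"at\s+([\w.$]+)\(", trace)
    return m.group(1) if m else ""

def is_framework_specific(trace: str) -> bool:
    """Classify a crash trace as framework-specific when its throwing
    frame belongs to an Android framework package."""
    return top_frame_package(trace).startswith(FRAMEWORK_PREFIXES)
```

Grouping 16,245 traces with a rule of this shape is what surfaces the finding that framework-specific exceptions dominate real-world app crashes.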